Skip to content

[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aabhinavg1
Copy link
Contributor

@aabhinavg1 aabhinavg1 commented Jul 11, 2025

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

Alive2 Proofs for AVG Operations

Opcode Operation Type Alive2 Proof Link
AVGFLOORS Signed Floor Avg Alive2 Link
AVGCEILS Signed Ceil Avg Alive2 Link
AVGFLOORU Unsigned Floor Avg Alive2 Link
AVGCEILU Unsigned Ceil Avg Alive2 Link

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Fixes: #147696

@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Jul 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>


Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-aarch64

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>


Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

@jayfoad
Copy link
Contributor

jayfoad commented Jul 11, 2025

Code change looks fine but I don't understand what the test you added actually shows.

Differential Revision:

Don't add this! We haven't used this since we moved away from Phabricator for code reviews.

@jayfoad
Copy link
Contributor

jayfoad commented Jul 11, 2025

Also, I don't understand how your alive2 links are relevant. They verify an expansion of avgfloors/avgceils, but what does this have to do with introducing poison?

@aabhinavg1
Copy link
Contributor Author

aabhinavg1 commented Jul 11, 2025

Hi @jayfoad
Thanks for pointing that out,

I included the Alive2 links to show that the lowering of AVGFLOORS and AVGCEILS avoids poison, but I get that it doesn't directly prove that the SelectionDAG op itself is safe.

If you don’t mind, could you suggest what kind of test cases would better show that these ops don’t introduce poison or undef?

@RKSimon
Copy link
Collaborator

RKSimon commented Jul 11, 2025

You need to remove the noundef and compare the effect of freezing the result vs the inputs:

----------------------------------------
define i8 @src(i8 %a, i8 %b) {
#0:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  %freeze = freeze i8 %trunc
  ret i8 %freeze
}
=>
define i8 @tgt(i8 %a, i8 %b) {
#0:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  ret i8 %trunc
}
Transformation seems to be correct!

(if you remove either of the dst freezes it should fail)

@RKSimon RKSimon changed the title [DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCE… [DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe Jul 11, 2025
case ISD::AVGFLOORS:
case ISD::AVGFLOORU:
case ISD::AVGCEILS:
case ISD::AVGCEILU:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need test coverage for all 4 patterns

Copy link
Contributor Author

@aabhinavg1 aabhinavg1 Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @RKSimon

Thanks for the feedback!

Alive2 Proofs for AVG Operations

Opcode Operation Type Alive2 Proof Link
AVGFLOORS Signed Floor Avg Alive2 Link
AVGCEILS Signed Ceil Avg Alive2 Link
AVGFLOORU Unsigned Floor Avg Alive2 Link
AVGCEILU Unsigned Ceil Avg Alive2 Link

Let me know if need to add anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These proofs are weird: the whole point of the and/xor/ashr/add implementation is that you don't need to extend to a wider type, you can do all operations in i8.

@davemgreen
Copy link
Collaborator

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

@aabhinavg1
Copy link
Contributor Author

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

Hi @davemgreen,

Thanks for the suggestion!

I've drafted the following test cases in llvm/test/CodeGen/AArch64/hadd-combine.ll to exercise the aarch64.neon.uhadd and aarch64.neon.urhadd intrinsics:

; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s

declare <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8>, <8 x i8>)
declare <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8>, <8 x i8>)
declare <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8>, <16 x i8>)
declare <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8>, <16 x i8>)

; CHECK-LABEL: test_uhadd_v8i8:
; CHECK: uhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_uhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_urhadd_v8i8:
; CHECK: urhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_urhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_uhadd_v16i8:
; CHECK: uhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_uhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

; CHECK-LABEL: test_urhadd_v16i8:
; CHECK: urhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_urhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

Would this be acceptable for verifying that the uhadd/urhadd intrinsics get correctly lowered into the corresponding DAG avg patterns?

@aabhinavg1 aabhinavg1 self-assigned this Jul 11, 2025
@RKSimon
Copy link
Collaborator

RKSimon commented Jul 11, 2025

What could be done is to fold based on the knownbits/signbits result of a avgu/avgs node - if the freeze is moved then the knownbits/signbits calculation is exposed and can allow the fold to occur.

@EugeneZelenko EugeneZelenko removed their request for review July 11, 2025 13:24
@aabhinavg1
Copy link
Contributor Author

Hi @RKSimon

here is the test could you please look in to it if i have not missed anything

; RUN: opt -passes=instcombine -S < %s | FileCheck %s

; === Signed Floor Average (AVGFLOORS) ===

define i8 @avg_signed_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_signed_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Signed Ceil Average (AVGCEILS) ===

define i8 @avg_signed_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_signed_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Floor Average (AVGFLOORU) ===

define i8 @avg_unsigned_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_unsigned_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Ceil Average (AVGCEILU) ===

define i8 @avg_unsigned_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_unsigned_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

I’ve added avg-poison-safety.ll with test coverage for all four ops:

  • AVGFLOORS (signed floor)
  • AVGCEILS (signed ceil)
  • AVGFLOORU (unsigned floor)
  • AVGCEILU (unsigned ceil)

Could you please review if I’ve missed anything?

%div3 = lshr i16 %add, 1
%conv2 = trunc nuw i16 %div3 to i8
ret i8 %conv2
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem you have if you attempt to freeze the expanded avg pattern - is that the freeze might get folded through instructions BEFORE the avg node is created. You're going to have to use the aarch64 hadd/rhadd intrinsics and rely on a fold that uses knownbits/signbits that is blocked if the freeze it still present and(avg(and(x,const0),and(y,const1)),const2) might work for avgceilu/avgflooru for instance.

llvm\test\CodeGen\X86\freeze-binary.ll has some examples of frozen calls of other intrinsics to give you some ideas that you could try for @llvm.aarch64.neon.shadd.v8i16 etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @RKSimon
I've now added a separate AArch64 test (avg-freeze.ll) that demonstrates freeze applied after the hadd/rhadd intrinsics (shadd, srhadd, uhadd, urhadd). This avoids blocking instruction selection and ensures correctness. The tests include:

  • freeze_shadd_vec
  • freeze_srhadd_vec
  • freeze_uhadd_vec
  • freeze_urhadd_vec

The logic is similar to how freeze-preserving intrinsic selection is handled in freeze-binary.ll for X86.

testcase
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=AARCH64

; This test verifies that applying `freeze` to the result of AArch64 hadd/rhadd intrinsics
; does not interfere with instruction selection.
; The intrinsic is computed first, then frozen. Since `freeze` is applied after,
; it does not block the selection of `shadd`, `srhadd`, `uhadd`, or `urhadd`.

declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>)

define <8 x i16> @freeze_shadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_shadd_vec:
; AARCH64:       shadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT:  ret
  %avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %a, <8 x i16> %b)
  %frozen = freeze <8 x i16> %avg
  ret <8 x i16> %frozen
}

define <8 x i16> @freeze_srhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_srhadd_vec:
; AARCH64:       srhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT:  ret
  %avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
  %frozen = freeze <8 x i16> %avg
  ret <8 x i16> %frozen
}

define <8 x i16> @freeze_uhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_uhadd_vec:
; AARCH64:       uhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT:  ret
  %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
  %frozen = freeze <8 x i16> %avg
  ret <8 x i16> %frozen
}

define <8 x i16> @freeze_urhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_urhadd_vec:
; AARCH64:       urhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT:  ret
  %avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
  %frozen = freeze <8 x i16> %avg
  ret <8 x i16> %frozen
}
llvm-lit report
./llvm-lit ../../llvm/test/CodeGen/AArch64/avg-freeze.ll

-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/AArch64/avg-freeze.ll (1 of 1)

Testing Time: 0.16s

Total Discovered Tests: 1
  Passed: 1 (100.00%)

Let me know if you'd like to see this added to a different file or want to keep a different name for the file coverage!
or i need to modify anything in the testcase

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind was something like this:

define <8 x i16> @src(<8 x i16> %a0, <8 x i16> %a1) {
  %m0 = and <8 x i16> %a0, splat (i16 15)
  %m1 = and <8 x i16> %a1, splat (i16 15)
  %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %frozen = freeze <8 x i16> %avg
  %mask = and <8 x i16> %frozen, splat (i16 31)
  ret <8 x i16> %mask
}
define <8 x i16> @tgt(<8 x i16> %a0, <8 x i16> %a1) {
  %f0 = freeze <8 x i16> %a0
  %f1 = freeze <8 x i16> %a1
  %m0 = and <8 x i16> %f0, splat (i16 15)
  %m1 = and <8 x i16> %f1, splat (i16 15)
  %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %mask = and <8 x i16> %avg, splat (i16 31)
  ret <8 x i16> %mask
}

But this doesn't work yet due to a missing support for AArch64ISD::MOVIshift in AArch64TargetLowering::computeKnownBitsForTargetNode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the src/tgt version highlights a more interesting optimization around freeze placement and known bits. As you mentioned, it doesn't work yet because support for AArch64ISD::MOVIshift is missing in computeKnownBitsForTargetNode .

What you said makes sense we can look into adding that support in a follow-up patch. Once it's there, we can add the more advanced test to properly check how known bits interact with freeze.

For this patch, I suggest we go ahead with the current tests. They still serve the main purpose, which is to make sure that freeze after hadd/rhadd doesn't interfere with instruction selection.

Let me know if you're good with that. If yes, we can land this one and open a new patch for the MOVIshift support and the src/tgt test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #148634 lands, you should be able to quickly remove avgfloor-u8.ll and add suitable test coverage to hadd-combine.ll similar to @src (call them something like @haddu_freeze/@rhaddu_freeze etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @RKSimon Is this test structure OK?
patterns into this unified hadd-freeze.ll after #148634 landed.

; RUN: llc < %s -march=arm64 -mattr=+neon | FileCheck %s

declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)

;===---------------------------------------------------------------------===;
; Test: freeze does not block uhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @uhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: uhadd_freeze:
; CHECK: uhadd
; CHECK: and
  %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %frozen = freeze <8 x i16> %avg
  %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
  ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block urhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @urhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: urhadd_freeze:
; CHECK: urhadd
; CHECK: and
  %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %frozen = freeze <8 x i16> %avg
  %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
  ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block shadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @shadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: shadd_freeze:
; CHECK: shadd
; CHECK: and
  %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %frozen = freeze <8 x i16> %avg
  %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
  ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block srhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @srhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: srhadd_freeze:
; CHECK: srhadd
; CHECK: and
  %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
  %avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
  %frozen = freeze <8 x i16> %avg
  %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
  ret <8 x i16> %masked
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a hadd-freeze.ll file should be fine - please get the patch updated to allow us to review the diffs. I'm expecting you will need to come up with a different approach for signed hadd tests as otherwise they will just fold to unsigned variants if they are known poisitive.

declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)

;===---------------------------------------------------------------------===;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these ;===-----'s. They don't add much.

…ILS as safe

This patch updates `SelectionDAG::canCreateUndefOrPoison` to indicate that
`ISD::AVGFLOORS` and `ISD::AVGCEILS` do not introduce poison or undef values.

| Opcode       | Operation Type    | Alive2 Proof Link |
|--------------|-------------------|--------------------|
| `AVGFLOORS`  | Signed Floor Avg  | [Alive2 Link](https://alive2.llvm.org/ce/z/Dwy8a5) |
| `AVGCEILS`   | Signed Ceil Avg   | [Alive2 Link](https://alive2.llvm.org/ce/z/_JKF8A) |
| `AVGFLOORU`  | Unsigned Floor Avg| [Alive2 Link](https://alive2.llvm.org/ce/z/2-B6RM) |
| `AVGCEILU`   | Unsigned Ceil Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/t5WZ6K) |

These patterns are safe due to the use of `sext i8` into `i32`, which ensures
no signed overflow occurs. The arithmetic is done in the wider domain before
truncating safely back to `i8`.

Includes test coverage to ensure correctness.
@aabhinavg1 aabhinavg1 force-pushed the D147696_SelectionDAG branch from 6999bf3 to ae97069 Compare July 17, 2025 10:48
; CHECK-NEXT: and v1.16b, v1.16b, v2.16b
; CHECK-NEXT: movi v2.8h, #31
; CHECK-NEXT: uhadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and should have gone - any idea why it hasn't?

; CHECK-NEXT: and v1.16b, v1.16b, v2.16b
; CHECK-NEXT: movi v2.8h, #31
; CHECK-NEXT: urhadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and should have gone - any idea why it hasn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM must be conservative around freeze to avoid miscompilation. Instruction selection may proceed, but value range assumptions cannot be made post-freeze.

#58321

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it help if you apply #149323?

Copy link
Contributor Author

@aabhinavg1 aabhinavg1 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tested with PR [#149323]
After rebuilding and rerunning llc on the neon-hadd-freeze.ll test, the instruction:

and v1.16b, v1.16b, v2.16b

is still present in the output for both uhadd_freeze and urhadd_freeze.

So it seems the patch does not remove the redundant and as expected.

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Jul 17, 2025
…des can't create undef/poison

Possible fix for failed fold in llvm#148191
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::AVGFLOORS/AVGFLOORU/AVGCEILS/AVGCEILU handling + tests
5 participants